Piecewise Strong Convexity of Neural Networks
We study the loss surface of a feed-forward neural network with ReLU non-linearities, regularized with weight decay. We show that the regularized loss function is piecewise strongly convex on an important open set which, under some conditions, contains all of its global minimizers. This is used to prove that local minima of the regularized loss function in this set are isolated, and that every differentiable critical point in this set is a local minimum, partially addressing an open problem posed at the Conference on Learning Theory (COLT) 2015. We also apply our result to linear neural networks to show that, with weight decay regularization, there are no non-zero critical points in a norm ball attaining training error below a given threshold. Finally, an experimental section validates our theoretical work, showing that the regularized loss function is almost always piecewise strongly convex when restricted to stochastic gradient descent trajectories for three standard image classification problems.
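The experimental claim above suggests a simple check a reader can reproduce. The sketch below is our illustration, not the authors' code: it estimates the smallest eigenvalue of the Hessian of a weight-decay-regularized loss L(θ) = MSE(θ) + λ‖θ‖² at points along an SGD trajectory, using power iteration on a shifted Hessian-vector product; a positive estimate indicates local strong convexity on the current ReLU activation region. The network size, data, λ, and the spectral shift are arbitrary choices for illustration.

```python
# Hypothetical sketch (not the paper's code): test local strong convexity of a
# weight-decay-regularized ReLU network loss along an SGD trajectory by
# estimating the smallest Hessian eigenvalue with shifted power iteration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny ReLU network and synthetic data; lam is the weight-decay coefficient.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
X, y = torch.randn(64, 10), torch.randn(64, 1)
lam = 1e-2

def reg_loss():
    mse = ((model(X) - y) ** 2).mean()
    wd = sum((p ** 2).sum() for p in model.parameters())
    return mse + lam * wd

def min_hessian_eig(n_iters=100, shift=10.0):
    """Estimate lambda_min of the Hessian of the regularized loss via power
    iteration on (shift * I - H); assumes shift exceeds lambda_max(H)."""
    params = list(model.parameters())
    loss = reg_loss()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    for _ in range(n_iters):
        # Hessian-vector product H v via a second differentiation.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        w = shift * v - hv          # power step on shift*I - H
        v = w / w.norm()
    hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
    hv = torch.cat([h.reshape(-1) for h in hv])
    return (v @ hv).item()          # Rayleigh quotient ~ lambda_min(H)

# Minibatch SGD on the regularized loss; periodically log the estimated
# smallest Hessian eigenvalue of the full regularized loss at the iterate.
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(200):
    idx = torch.randint(0, 64, (16,))
    opt.zero_grad()
    mse = ((model(X[idx]) - y[idx]) ** 2).mean()
    wd = sum((p ** 2).sum() for p in model.parameters())
    (mse + lam * wd).backward()
    opt.step()
    if step % 50 == 0:
        print(step, min_hessian_eig())
```

Note that the shifted power iteration converges to the eigenvector for lambda_min(H) only when the shift exceeds the largest Hessian eigenvalue, so in practice the shift should be set above an estimate of ‖H‖ (for example, obtained from a preliminary power iteration on H itself).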
Reviews: Piecewise Strong Convexity of Neural Networks
Originality: I am not convinced that the contributions of this paper are more significant than those of [1], which has already been cited in this paper. Specifically, in the comparison with [1] at Line 82, the authors state that these conclusions apply to a smaller set in weight space. I would appreciate it if the authors could quantify the difference and add a discussion section presenting the comparison in some mathematical form. Further, there have been quite a few papers that show convergence of GD on neural networks using something like strong convexity.

Clarity: The paper is written quite clearly and is easy to follow.
Meta-Review: Piecewise Strong Convexity of Neural Networks
This paper shows that the quadratic loss with weight decay of deep ReLU networks is piecewise strongly convex on a nonempty open set where every critical point is a local minimum and every local minimum is isolated. Initially the paper received mixed reviews, with two positive and one negative. On the positive side, the contribution is found to be quite significant because it analyzes realistic networks (deep and non-linear). On the other hand, one reviewer had issues with the proof and another with the experiments. The rebuttal addressed the issues raised by the reviewers, and the negative reviewer updated their score.